This is still a work in progress.

Contents:

Executive summary

Introduction

The purpose of interactive data visualisation

Why the web?

Existing tools:

HTMLWidgets

plotly

rbokeh

ggvis

crosstalk

shiny

A comparison to native interactive tools: iPlots, Mondrian, rggobi

In brief: SVG, Canvas; JavaScript, HTML, CSS; the Document Object Model (DOM)


Executive summary

[inprogress]


1 Introduction

The purpose of this report is to investigate current solutions for creating interactive data visualisations in R that are accessible through the web. Interactive visualisations can inform and explain our data in ways that static plots cannot. We aim to identify the key similarities, differences and limitations of existing tools, and to find ways to overcome these limitations to meet user needs.

The purpose of interactive plots

Interactive plots allow users to explore the data freely. Beyond presenting data in a more visually appealing way, they can help explain a topic to a more general audience. As Murray () suggests, static visualisations can only ‘offer precomposed “views” of data’, whereas interactive plots can provide us with different perspectives. Interactive data visualisation is becoming increasingly popular, particularly in statistics teaching, education and data journalism, and demand for it is likely to continue in the future.

Why the web?

The web has become an easy way of sharing work and reaching a wider audience. Content on the web is accessible to anyone with a browser, without the user having to worry about installation issues or device compatibility.


2 Existing tools for interactive data visualisation

Most existing tools for quickly creating interactive web plots in R belong to a class of R packages known as HTMLwidgets. For tools outside that class, the ggvis and Shiny packages are popular alternatives. These are discussed below along with their limitations. (May include gridSVG + custom JavaScript.)

HTMLWidgets

An HTMLwidget is an R package that gives users access to an existing JavaScript library through bindings between R functions and that library (). HTMLwidgets serve different purposes depending on what the underlying JavaScript library does: highcharter and rbokeh generate plots using the Highcharts and Bokeh JavaScript libraries respectively, DT generates interactive tables using DataTables, and leaflet produces interactive maps.

The main HTMLwidget package that we have looked at in detail is plotly, as it focuses on adding interactivity to a wide range of plots and is compatible with the R packages Shiny and crosstalk (discussed in more detail below).

Plotly

Plotly is a graphing library that uses the plotly.js API, which is built upon D3. It is powerful in the sense that it can convert plots created with ggplot2 into interactive plots using ggplotly(). It provides basic interactivity including tooltips, zooming and panning, selection of points, and subsetting of groups of data through its legend. We can also combine plots using the subplot() function, allowing users to create facetted plots manually.

Figure: plotly plot of the iris dataset
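
As a rough illustration of the features described above, the following sketch builds a plotly scatterplot of the iris data, converts a ggplot2 plot with ggplotly(), and combines the two with subplot(). The choice of variables and the object names (p1, g, p2, p) are assumptions made for this example, not the code behind the figure.

```r
library(ggplot2)
library(plotly)

# A plotly scatterplot built directly from the iris data.
p1 <- plot_ly(iris, x = ~Petal.Length, y = ~Petal.Width,
              color = ~Species, type = "scatter", mode = "markers")

# An ordinary ggplot2 plot, converted into an interactive plotly plot.
g <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point()
p2 <- ggplotly(g)

# Combine the two plots side by side, keeping each plot's axis titles.
p <- subplot(p1, p2, nrows = 1, titleX = TRUE, titleY = TRUE)

# Printing an htmlwidget opens it in the viewer or browser.
if (interactive()) print(p)
```

Tooltips, zooming, panning and legend-based subsetting come with both plots without any further code.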

ggvis

Another common data visualisation package is ggvis. This package uses the Vega JavaScript library to render its plots, but also uses Shiny to drive some of its interactions (). The plots are based upon the “Grammar of Graphics”, and the package aims to be an interactive visualisation tool for exploratory analysis. It has an advantage over HTMLwidgets in that it provides statistical functions for plotting, such as layer_model_predictions() for drawing trendlines from a fitted model. Furthermore, because some of the interactions are driven by Shiny, we can add Shiny-like ‘inputs’ such as sliders and checkboxes to control and filter the plot, as well as tooltips.
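
A minimal sketch of these ggvis features is given below, assuming the mtcars dataset and a hypothetical tooltip function show_values(); it is not the code used for the figures in this report. It adds a linear trendline with layer_model_predictions(), a smoother whose span is controlled by a slider, and hover tooltips.

```r
library(ggvis)

# Hypothetical helper: builds the tooltip HTML for the hovered point.
show_values <- function(x) {
  if (is.null(x)) return(NULL)
  paste0("wt: ", x$wt, "<br>mpg: ", x$mpg)
}

v <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points() %>%
  # Trendline from a fitted linear model, with a standard error band.
  layer_model_predictions(model = "lm", se = TRUE) %>%
  # Loess smoother whose span is driven by a Shiny-style slider input.
  layer_smooths(span = input_slider(0.3, 1, value = 0.75,
                                    label = "Smoothing span")) %>%
  add_tooltip(show_values, "hover")

# The slider makes this a dynamic plot, so it needs a live R session.
if (interactive()) print(v)
```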

Figure: basic ggvis plot with tooltips

Figure: Ability to change a trendline with a slider and filters using ggvis alone

https://github.com/rstudio/ggvis/tree/master/demo/apps

However, we are limited to basic interactivity as we are not able to link layers of plot objects together. Furthermore, ggvis is slow at rendering plots with many data points, as the DOM cannot handle a large number of SVG elements at once (Chang, 2014). To date, the ggvis package is still under development, with more features to come in the near future.

Considering plotly and ggvis on their own, we find that these solutions quickly give the user interactive plots with basic functionality such as tooltips, zooming and panning, and subsetting. They do not provide further information about the data, nor can they be linked to other plots or statistical analyses. Their interactions are hard to customise, as the functions that create these plots come with predefined behaviour that cannot be changed without knowing the underlying JavaScript API well. ggvis can go further by adding basic user interface controls such as filters and sliders to control parts of the plot, but only to a certain extent. Fortunately, the interactivity of these packages can be extended by coupling them with Shiny or crosstalk.

Crosstalk

Crosstalk is an add-on package that allows HTMLwidgets to communicate with each other and be linked together. As Cheng (2016) explains, it is designed to link and co-ordinate different views of the same data (useR Conference 2016). The data is converted into a ‘shared’ object (an R6 object), which has a corresponding key for each row of observations. When a selection occurs, crosstalk sends messages between HTMLwidgets to communicate what has been selected, and the bound HTMLwidgets respond accordingly. This all happens in the browser, where crosstalk acts as a ‘messenger’ between HTMLwidgets.
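
The mechanism described above can be sketched as follows, assuming the iris data and column widths chosen so that they sum to 12 (the Bootstrap grid width). The object names shared_iris and linked are illustrative. Both plots and the table are given the same SharedData object, so a selection made in one widget is passed on to the others in the browser.

```r
library(crosstalk)
library(plotly)
library(DT)

# Wrap the data in a shared (R6) object; each row gets a key.
shared_iris <- SharedData$new(iris)

linked <- bscols(
  widths = c(4, 4, 4),
  plot_ly(shared_iris, x = ~Petal.Length, y = ~Petal.Width,
          type = "scatter", mode = "markers") %>%
    highlight(on = "plotly_selected"),
  plot_ly(shared_iris, x = ~Sepal.Length, y = ~Sepal.Width,
          type = "scatter", mode = "markers") %>%
    highlight(on = "plotly_selected"),
  datatable(shared_iris)
)

if (interactive()) print(linked)
```

No Shiny server is involved here: the resulting page can be saved and shared as a standalone HTML file.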

Example: Linked brushing between two Plotly plots and a data table

Example: Scatterplot matrix linked together with crosstalk

However, crosstalk has several limitations. As Cheng (2016) points out, the only interactions it currently supports are linked brushing and filtering, and these can only be done on data in a ‘row-observation’ format. This means that it cannot be used on aggregated data, such as linking a histogram to a scatterplot. Furthermore, it is currently supported by only a limited number of HTMLwidgets: Plotly, DT and Leaflet. This is because the implementation of crosstalk is relatively complex. From a developer’s point of view, it requires creating bindings between crosstalk and the HTMLwidget itself, and customising how the widget reacts to selection and filtering. Although crosstalk is still under development, it is promising, as other HTMLwidget developers have expressed interest in linking their packages with crosstalk to create more informative visualisations.

Shiny

Shiny is an R package that builds web applications through R (RStudio, 2012). It connects R, acting as a server, to the browser, acting as a client, so that R outputs are rendered on a web page. This allows users to code in R without needing to learn the main web technologies: HTML, CSS and JavaScript. A Shiny app can be viewed as a set of links between ‘inputs’ (what is sent to R whenever the end user interacts with different parts of the page) and ‘outputs’ (what the end user sees on the page), where outputs update whenever an input changes. There are many ways to use Shiny to create more interactive data visualisations: we can use Shiny alone to create interactive plots, or use it to extend the interactivity of HTMLwidgets and other R packages.
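
To make the input/output idea concrete, here is a minimal sketch of a Shiny app with one input (a slider) and one output (a histogram). The input id "bins", the output id "hist" and the use of the iris data are assumptions for illustration only.

```r
library(shiny)

# UI: what the end user sees in the browser.
ui <- fluidPage(
  sliderInput("bins", "Number of bins", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: R code that re-runs whenever an input it reads changes.
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(iris$Petal.Length, breaks = input$bins,
         main = "Petal length", xlab = "Petal length (cm)")
  })
}

app <- shinyApp(ui, server)
if (interactive()) runApp(app)
```

Moving the slider sends a new value of input$bins to R, which redraws the histogram and sends the image back to the page.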

Interactivity with Shiny alone

Shiny can provide some interactivity to plots. Below is an example of some linked brushing on a base plot:
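
A sketch of how such an app might be written is shown here. It assumes two base R scatterplots of the iris data and uses Shiny's brushedPoints() to find the rows under the brush. The ids ("scatter", "scatter_brush", "linked", "selected") are made up for this example.

```r
library(shiny)

ui <- fluidPage(
  # Dragging on this plot creates input$scatter_brush.
  plotOutput("scatter", brush = brushOpts(id = "scatter_brush")),
  plotOutput("linked"),
  tableOutput("selected")
)

server <- function(input, output, session) {
  output$scatter <- renderPlot({
    plot(iris$Petal.Length, iris$Petal.Width,
         xlab = "Petal length", ylab = "Petal width")
  })

  # Second view of the same rows: brushed points are drawn in red.
  output$linked <- renderPlot({
    rows <- brushedPoints(iris, input$scatter_brush,
                          xvar = "Petal.Length", yvar = "Petal.Width",
                          allRows = TRUE)
    plot(iris$Sepal.Length, iris$Sepal.Width, pch = 19,
         col = ifelse(rows$selected_, "red", "grey60"),
         xlab = "Sepal length", ylab = "Sepal width")
  })

  # For base plots, xvar and yvar must be given explicitly.
  output$selected <- renderTable({
    brushedPoints(iris, input$scatter_brush,
                  xvar = "Petal.Length", yvar = "Petal.Width")
  })
}

app <- shinyApp(ui, server)
if (interactive()) runApp(app)
```

Note that the second plot is redrawn in full on every brush, which is the re-rendering behaviour discussed later in this report.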

Example: Linked brushing on a plot from ggplot2

Example: Facetted ggplot with linked brushing

However, these basic interactive tools only work on base R plots or plots rendered using ggplot2, and work best on scatter plots. This is because, for these plots, Shiny can correctly map the pixel co-ordinates of the plot to the data (Shiny’s advanced plot interaction article). When we try this on a lattice plot, the mapping fails, as the co-ordinate system of the plot differs from that of the data.

Example: Linked brushing on a lattice plot that fails to produce correct mapping

With Shiny alone, we can achieve some basic interactivity, along with user interface controls outside of the plot that change what we see. However, when we wish to drive interactions within a plot, we are limited to simple interactions such as brushing and clicking on points. This method only works for plots rendered in base R graphics and ggplot2, and cannot be extended to grid plots or other R plots.

Extending interactivity with HTMLWidgets and Shiny

Although Shiny is great at facilitating interactions from outside of a plot, it is limited in facilitating interactions within a plot. HTMLwidgets, on the other hand, have basic in-plot interactivity built in but are limited in ‘out of plot’ interactions. Combining the two lets each cover the other’s weakness.

Example: a Shiny app with a Plotly plot with linked brushing

  • Because Plotly is implemented as an HTMLwidget, it is easy to embed its plots in Shiny.
  • Plotly can also render plots from ggplot2.
  • In-plot interactions, and linking other outputs to the plots, appear to be much easier to achieve.
  • CRAN documentation
  • A cheatsheet for using Plotly in R
  • Faceted plots can be created, but each plot must be built first and then combined using the subplot() function.
  • Faceted plots can also be built in ggplot2 and then rendered with Plotly (using ggplotly()).
  • Limitations: it appears that points cannot be selected across different ‘curves’; selection is limited to a single subplot. The selection box does not display correctly (although if you drag over the points you wish to select, the table correctly reports the points for that plot). It is best to reset the plot before selecting points from a different subplot.
  • The curve number changes according to which plot you refer to (counting from 0 rather than 1), and the point number refers to the point (row number in the original data).
  • The curve number can also refer to a ‘trace’ (if the data contains subsets, e.g. gender with males and females, males = 0 and females = 1). From the Plotly in R website: ‘curveNumber: for multiple traces, information will be returned in a stacked fashion’.
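
The notes above can be illustrated with a small sketch of a Shiny app that embeds a Plotly plot and reports the current selection through plotly's event_data(). The source label "iris" and the output ids are assumptions for this example. Colouring by Species creates one trace per species, so the printed curveNumber takes the values 0, 1 and 2.

```r
library(shiny)
library(plotly)

ui <- fluidPage(
  plotlyOutput("scatter"),
  verbatimTextOutput("selection")
)

server <- function(input, output, session) {
  output$scatter <- renderPlotly({
    plot_ly(iris, x = ~Petal.Length, y = ~Petal.Width, color = ~Species,
            type = "scatter", mode = "markers", source = "iris") %>%
      layout(dragmode = "select")
  })

  # event_data() returns curveNumber, pointNumber, x and y for the
  # selected points, or NULL when nothing has been selected yet.
  output$selection <- renderPrint({
    sel <- event_data("plotly_selected", source = "iris")
    if (is.null(sel)) "Drag over points to select them" else sel
  })
}

app <- shinyApp(ui, server)
if (interactive()) runApp(app)
```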

Example: an example of linked brushing between ggvis plots

  • ggvis has its own set of functions that allow for similar interactions to be achieved.
  • To get something similar to brushedPoints(), a few additional lines of code are required: a linked brush object must be created, and the dataset must be made reactive to brushing in order to link plot brushing to the table.
  • Can incorporate simple ggvis interactions (such as hovers and clicks, sliders)
  • ggvis Interactivity
  • CRAN Documentation
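
The extra steps mentioned in the notes above might look roughly like the sketch below, which follows the pattern of the linked brush demo shipped with ggvis. The dataset (mtcars), the id column and the output ids are assumptions, and the code is untested against recent versions of ggvis.

```r
library(shiny)
library(ggvis)

cars <- mtcars
cars$id <- seq_len(nrow(cars))   # one key per row, used to link the brush

ui <- fluidPage(
  ggvisOutput("plot1"),
  ggvisOutput("plot2"),
  tableOutput("brushed")
)

server <- function(input, output, session) {
  # The linked brush object shares the selection between plots.
  lb <- linked_brush(keys = cars$id, fill = "red")

  # Plot that owns the brush.
  cars %>%
    ggvis(~disp, ~mpg) %>%
    layer_points(fill := lb$fill, fill.brush := "red") %>%
    lb$input() %>%
    bind_shiny("plot1")

  # Plot that only reflects the selection.
  cars %>%
    ggvis(~wt, ~hp) %>%
    layer_points(fill := lb$fill) %>%
    bind_shiny("plot2")

  # lb$selected() is a reactive logical vector, one entry per key.
  output$brushed <- renderTable({
    cars[lb$selected(), ]
  })
}

app <- shinyApp(ui, server)
if (interactive()) runApp(app)
```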

One of Shiny’s advantages is that it establishes a connection to R so that statistical computing can occur, while leaving the browser to drive on-plot interactions. However, we are still limited in the sense that whenever we launch a Shiny app, we do not have access to R while it runs that session. Furthermore, Shiny cannot make small changes that leave most of the plot untouched (for example, changing the model behind a trendline while keeping all points the same), because it works by re-rendering the whole output whenever the end user changes an ‘input’ it depends on. This may lead to unnecessary computation and slow the app down.

Stretching limitations and other tools:

From the above, the interactions that Shiny achieves are not interactions on the plot itself, but rather interactions driven from outside the plot that cause it to change. With HTMLwidgets and ggvis, we cannot easily add our own custom interactions to the plot, such as linking a point to a URL, without expertise in the JavaScript libraries underlying these packages. This makes it hard for the user to extend these plots further.

One such example is highlighting only part of a box plot, to show the values between the lower quartile and the median. While this can easily be achieved with gridSVG and custom JavaScript, it cannot with the existing tools discussed above. With those tools, the problem is knowing which elements to manipulate, which may require some background knowledge of the underlying JavaScript library.
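
As a hedged sketch of the gridSVG approach, the code below draws a simplified box plot by hand with grid, names the rectangle between the lower quartile and the median, and attaches a small piece of inline JavaScript to it before exporting to SVG. The grob names ("lowerHalf", "upperHalf"), the output path and the choice of data are assumptions. A plot produced by lattice or ggplot2 would need its own grob names looked up first.

```r
library(grid)
library(gridSVG)

x   <- iris$Sepal.Length
q   <- unname(quantile(x, c(0.25, 0.5, 0.75)))
rng <- range(x)

grid.newpage()
pushViewport(viewport(width = 0.8, height = 0.3, xscale = rng))

# Whisker line from minimum to maximum.
grid.segments(unit(rng[1], "native"), unit(0.5, "npc"),
              unit(rng[2], "native"), unit(0.5, "npc"))

# Lower quartile to median: the part we want to make interactive.
grid.rect(x = unit(q[1], "native"), y = unit(0.5, "npc"),
          width = unit(q[2] - q[1], "native"), height = unit(1, "npc"),
          just = "left", gp = gpar(fill = "steelblue"), name = "lowerHalf")

# Median to upper quartile.
grid.rect(x = unit(q[2], "native"), y = unit(0.5, "npc"),
          width = unit(q[3] - q[2], "native"), height = unit(1, "npc"),
          just = "left", gp = gpar(fill = "grey80"), name = "upperHalf")
popViewport()

# Attach custom JavaScript to the named grob only.
grid.garnish("lowerHalf",
             onmouseover = "this.setAttribute('opacity', '0.4')",
             onmouseout  = "this.setAttribute('opacity', '1')")

out <- file.path(tempdir(), "boxplot-demo.svg")
grid.export(out)
```

Opening the exported file in a browser and hovering over the lower half of the box fades only that element, without any server or re-rendering.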


  • Limitations/thoughts (gridSVG): currently works only with a plot pre-defined in R; how accurate is gridSVG in translating co-ordinates(?); works best when the plot has a consistent naming scheme (for panels and for locating elements)

  • If you’re familiar with HTML/CSS/JS, you may be able to extend it further, but this may not be recommended
  • Shiny runs on a reactive programming model (which automatically updates everything whenever something changes)
  • Disadvantages: speed and efficiency; rerunning of code can slow things down (especially with large datasets)
  • ‘render’ functions are prepackaged, which makes them easy to use, but hard to customise
  • Writing JavaScript into Shiny is another way of communicating between R and the browser (using the relevant JS and Shiny functions: Shiny.onInputChange(), session$sendCustomMessage(), observe()), as a way of not having to render the entire plot again. https://ryouready.wordpress.com/2013/11/20/sending-data-from-client-to-server-and-back-using-shiny/
  • Simple interactivity with Shiny via base plots and ggplot2: linked brushing (based upon a mapping between the pixel co-ordinates of the PNG image and the data, so it is not ideal to apply it to other R plots such as lattice or grid)
  • Generic compared to crosstalk; allows ‘out-plot’ interactions
  • Can be used with crosstalk (R6 objects)
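
The two-way messaging mentioned in the notes above can be sketched as follows. A hand-written SVG circle stands in for a point on a plot. The message name 'set-radius', the element id 'pt' and the input name pt_click are all made up for this example. R changes one attribute of the existing element through session$sendCustomMessage(), and the browser reports clicks back through Shiny.onInputChange(), so nothing is re-rendered.

```r
library(shiny)

ui <- fluidPage(
  sliderInput("radius", "Point radius", min = 2, max = 30, value = 10),
  HTML('<svg width="200" height="100">
          <circle id="pt" cx="100" cy="50" r="10" fill="steelblue"></circle>
        </svg>'),
  verbatimTextOutput("clicked"),
  tags$script(HTML("
    // R -> browser: update one attribute of the existing element.
    Shiny.addCustomMessageHandler('set-radius', function(msg) {
      document.getElementById('pt').setAttribute('r', msg.r);
    });
    // browser -> R: report clicks on the point as input$pt_click.
    var clicks = 0;
    document.getElementById('pt').addEventListener('click', function() {
      clicks += 1;
      Shiny.onInputChange('pt_click', clicks);
    });
  "))
)

server <- function(input, output, session) {
  # Runs whenever the slider changes; sends only the new radius.
  observe({
    session$sendCustomMessage("set-radius", list(r = input$radius))
  })

  output$clicked <- renderText({
    if (is.null(input$pt_click)) "Point not clicked yet"
    else paste("Point clicked", input$pt_click, "time(s)")
  })
}

app <- shinyApp(ui, server)
if (interactive()) runApp(app)
```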

Another possible reason why it may be hard to prevent re-rendering of plots (not sure if this can be considered an underlying problem?): in all cases of using plotly, ggvis, or even ggplot2, even though the plots are built from ‘layers’, it does not appear possible to isolate a single ‘layer’ and modify it without drawing the entire plot again. (Sometimes when we try to run a single ‘layer’ on its own, it draws an entirely different plot, which is not what we want, or raises an error.) You can add layers, but you always have to refer back to the plot (either through %>%, or by storing the plot as a variable). Regardless, Shiny will always(?) re-render the entire plot.

  • What do native tools achieve that these web tools/HTMLwidgets don’t (without combining)?
  • Linked brushing
  • Zooming in (general purpose)
  • Able to facilitate and handle LARGE datasets
  • Share the same data on different plots (not just scatter plots, but also histograms, bar plots, box plots…)
  • Query selection (subset points)

iPlots:

  • iPlots page
  • Interactive graphs in R using Java (run through JGR)
  • Features: querying, highlighting, colour brushing, changing parameters
  • Does it redraw?
  • Possible to add things to a plot via its API?
  • Functions for different graphs: imosaic(), ibar(), ipcp() (parallel plots), ibox(), ilines(), ihist(), etc.
  • It’s a little old, but it’s pretty great at linking and brushing
  • Downsides: uses Java and JGR (installation was a bit of a hiccup), plots look somewhat outdated, no way of connecting plots to the web (native solution!)
  • Similarity to grid/base plots: use of ‘objects’ and object lists; you can easily remove and add plot objects
  • In comparison to all the other packages we’ve been looking at, it’s not available to view in a web browser.
  • Short learning curve (~1 hour or so to learn basic plotting and interactive functions)
  • Use of keyboard shortcuts and mouse keys for some interactive features

Mondrian:

  • Main page
  • A possible reading to look into: Interactive Graphics for Data Analysis
  • Another piece of software written in Java
  • Looks very similar to iPlots (Mondrian doesn’t require coding, and is instead aimed at the end user)
  • Supports features such as brushing, linking plots together, querying and visualisation of large datasets
  • Possible to import R data frames for analysis

The interesting part is how Mondrian and iPlots manage to do linking so ‘effortlessly’, and whether that can be translated onto the web. It might be too hard to tell from the source code (unfortunately, I don’t know Java). Could we find tools that do similar things?

  • Martin Theus’ home page
  • His talk on interactive graphics in 2006
  • His talk slides on Mondrian in 2008
  • More talk slides
  • Might investigate this more to see if we could make something similar in JS/for the web
  • Linking a scatterplot to a bar plot demo: this uses model.js, which is a ‘reactive model library used for data visualisation’ (easily achievable in Shiny)


findings:

Challenge summary (Boxplot, Trendlines, Arrays):

  • Shiny is great for anything that requires statistical computation (such as trendlines), as you have a link back to R, and for building a modern UI (Bootstrap + HTML).
  • Crosstalk is great for linking plots together, but is currently only available for Plotly and scatterplots. iPlots has the upper hand here, with linking capabilities that extend to different kinds of plots.
  • Plotly, rbokeh, highcharts and ggvis are good for incorporating ‘basic’ interactivity within the plot (especially for a single plot: basic information about that plot, points, zooming in, selection, basic stats, etc.). They are more about making an ‘easy’ visual than about using interactivity to find out more information and gain more insight. (For example, a selection made on the plot doesn’t give you any information about it: does it have outliers? What does the selected group look like as a whole? Couple it with Shiny and you’re likely to get a lot further.) iPlots could get you further in terms of being able to return selections from plots.
  • It’s hard to customise your own on-plot/in-plot interactions (as found from the boxplot challenge), as most functions have a set event attached to them (or simply: you plug in data, generally in JSON format, and it just gives you a standard plot). These functions were designed to make plotting easy for the user without having to learn web technologies (HTML, CSS, JavaScript). As these JS libraries were originally built for a different language (such as JavaScript, Python, etc.), features may be limited (plus possible limitations of creating an HTMLwidget package, if any).
  • Simple JavaScript solutions work well for on-plot interactions that do not require updating. It becomes a challenge when we try to devise a solution that requires updating co-ordinates (such as manually changing the shape of a trendline); this is easily achieved with Shiny, but requires repeated rendering of the entire plot.
  • The approach during these challenges was to find out which tool does what best, and then find a way to combine them. In some cases it worked well (as seen in the array challenge); other times it was hopeless (boxplot challenge), simply because the tools didn’t have the capability or required more expertise and investigation.

Conclusion

From our findings, we have established that there is more that can be achieved in expanding interactive graphics to create better data visualisations for users.


References:


Extras: